Embrace conda packages

The build system we always needed, but never deserved

Juan Luis Cano Rodríguez
Madrid, 2016-04-08

Outline

  • Introduction
  • Motivation: What brought us here?
  • Our first conda package
  • Some more tricks
  • Working with other languages
  • conda-forge: a community repository
  • Limitations and future work
  • Conclusions

Who is this guy?

  • Almost Aerospace Engineer
  • Quant Developer for BBVA at Indizen (yeah, lots of Python there!)
  • Writer and furious tweeter at Pybonacci
  • Chair and BDFL of Python España
  • Co-creator and charismatic leader of AeroPython (*not the Lorena Barba course)
  • When time permits (rare) writes some open source Python code

You know, I've been giving talks on Python and its scientific ecosystem for about three years now... And I always write this bit there, that "Almost" word in italics before my background. You may reasonably wonder what the heck I've been doing all these years to always introduce myself as an "almost" Aerospace Engineer, right? Well, I promise that I'm taking the required steps to graduate no later than this autumn, but anyway this talk reflects one of the severe pains I've been going through while carrying out my final project.

Motivation: What brought us here?

Let's begin with some questions:

  • Who writes Python code here, either for a living or for fun?
  • Who can write a setup.py... without copying a working one from the Internet?
  • How many Linux users... can configure a Visual Studio project properly?
  • How many of you are using Anaconda... because it was the only way to survive?

...or: "The sad state of scientific software"

Some inconvenient truths:

Portability is hard (unless you stick to pure Python)

Properly distributing software libraries is very hard

Result:

What horror have we created

If you’re missing a library or program, and that library or program happens to be written in C, you either need root to install it from your package manager, or you will descend into a lovecraftian nightmare of attempted local builds from which there is no escape. You say you need lxml on shared hosting and they don’t have libxml2 installed? Well, fuck you.

— Eevee, "The sad state of web app deployment"

Are virtual machines and containers the solution?

"It's easy to build a VM if you automate the install process, and providing that install script for even one OS can demystify the install process for others; conversely, just because you provide a VM doesn't mean that anyone other than you can install your software"

— C. Titus Brown, "Virtual machines considered harmful for reproducibility"

Our first conda package

Let's install conda-build!


In [1]:
!conda install -y conda-build -q -n root


Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ........
Solving package specifications: .........

# All requested packages already installed.
# packages in environment at /home/juanlu/.miniconda3:
#
conda-build               1.20.0                   py34_0  

conda packages are created from conda recipes. We can generate a bare recipe for a package that is already on PyPI using conda skeleton.


In [3]:
!conda skeleton pypi pytest-benchmark > /dev/null


Using Anaconda Cloud api site https://api.anaconda.org

In [4]:
!ls pytest-benchmark


bld.bat  build.sh  meta.yaml

These are the minimum files for the recipe:

  • meta.yaml contains all the metadata
  • build.sh and bld.bat are the build scripts for Linux/OS X and Windows respectively

The meta.yaml file

It contains the metadata in YAML format.

  • package, source and build specify the name and version, the source location and the build options of the package
  • requirements specifies the build (install time) and run (runtime) requirements
  • test specifies the imports, commands and scripts to test
  • about adds some additional data for the package

In [26]:
!grep -v "#" pytest-benchmark/meta.yaml | head -n24


package:
  name: pytest-benchmark
  version: "3.0.0"

source:
  fn: pytest-benchmark-3.0.0.zip
  url: https://pypi.python.org/packages/source/p/pytest-benchmark/pytest-benchmark-3.0.0.zip
  md5: f8ab8e438f039366e3765168ad831b4c

build:
  preserve_egg_dir: True



requirements:
  build:
    - python
    - setuptools
    - pytest >=2.6

  run:
    - python
    - setuptools
    - pytest >=2.6
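The md5 field in the source section is the checksum of the downloaded archive. If you write a recipe by hand instead of using conda skeleton, you have to compute it yourself. A minimal sketch with the standard library follows; md5_of_file is a made-up helper name, not part of conda-build.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so that large source archives need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py pytest-benchmark-3.0.0.zip
    for filename in sys.argv[1:]:
        print(md5_of_file(filename), filename)
```

The printed digest is what goes in the md5 field of meta.yaml.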

The build.sh and bld.bat files

They specify how to build the package.


In [28]:
!cat pytest-benchmark/build.sh


#!/bin/bash

$PYTHON setup.py install

# Add more build steps here, if they are necessary.

# See
# http://docs.continuum.io/conda/build.html
# for a list of environment variables that are set during the build process.

In [30]:
!grep -v "::" pytest-benchmark/bld.bat


"%PYTHON%" setup.py install
if errorlevel 1 exit 1


The build process

Adapted from http://conda.pydata.org/docs/building/recipe.html#conda-recipe-files-overview

  1. Downloads the source
  2. Applies patches (if any)
  3. Installs build dependencies
  4. Runs the build script
  5. Packages new files
  6. Runs tests against the newly created package

Seems legit!


In [32]:
!conda build pytest-benchmark --python 3.5 > /dev/null  # It works!


Using Anaconda Cloud api site https://api.anaconda.org
+ /home/juanlu/.miniconda3/envs/_build/bin/python setup.py install
warning: no directories found matching 'examples'
warning: no files found matching '.isort.cfg'
warning: no files found matching '.pylintrc'
warning: no previously-included files matching '*.py[cod]' found anywhere in distribution
warning: no previously-included files matching '__pycache__' found anywhere in distribution
warning: no previously-included files matching '*.so' found anywhere in distribution
warning: no previously-included files matching '*.dylib' found anywhere in distribution

In [33]:
!ls ~/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2


/home/juanlu/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2
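The file name of the built package follows a name-version-build pattern, with the platform as the parent directory. A small sketch that takes such a name apart, assuming the .tar.bz2 extension shown above; split_package_filename is a hypothetical helper, not a conda API.

```python
def split_package_filename(filename):
    """Split 'name-version-build.tar.bz2' into (name, version, build)."""
    suffix = ".tar.bz2"
    if not filename.endswith(suffix):
        raise ValueError("expected a .tar.bz2 conda package: %r" % filename)
    stem = filename[: -len(suffix)]
    # The name itself may contain dashes (pytest-benchmark), so split from the right
    name, version, build = stem.rsplit("-", 2)
    return name, version, build


print(split_package_filename("pytest-benchmark-3.0.0-py35_0.tar.bz2"))
# ('pytest-benchmark', '3.0.0', 'py35_0')
```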

In [35]:
!conda install pytest-benchmark --use-local --yes


Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ..........
Solving package specifications: .........

Package plan for installation in environment /home/juanlu/.miniconda3/envs/py35:

The following NEW packages will be INSTALLED:

    pytest-benchmark: 3.0.0-py35_0

Linking packages ...
[      COMPLETE      ]|###################################################| 100%

Build, test, upload, repeat

Let's upload the package first using anaconda-client:


In [38]:
!conda install anaconda-client --quiet --yes


Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ........
Solving package specifications: .........

# All requested packages already installed.
# packages in environment at /home/juanlu/.miniconda3/envs/py35:
#
anaconda-client           1.4.0                    py35_0  

In [39]:
!anaconda upload ~/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2


Using Anaconda Cloud api site https://api.anaconda.org
detecting package type ...
conda
extracting package attributes for upload ...
done

Uploading file Juanlu001/pytest-benchmark/3.0.0/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 ... 
 uploaded 54 of 54Kb: 100.00% ETA: 0.0 minutes


Upload(s) Complete

Package located at:
https://anaconda.org/juanlu001/pytest-benchmark

And now, let's install it!


In [47]:
!conda remove pytest-benchmark --yes > /dev/null


Using Anaconda Cloud api site https://api.anaconda.org

In [48]:
!conda install pytest-benchmark --channel juanlu001 --yes


Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ..........
Solving package specifications: .........

Package plan for installation in environment /home/juanlu/.miniconda3/envs/py35:

The following NEW packages will be INSTALLED:

    pytest-benchmark: 3.0.0-py35_0

Linking packages ...
[      COMPLETE      ]|###################################################| 100%

Some more tricks

Running the tests

You can run your tests with Python, Perl or shell scripts (run_test.[py,pl,sh,bat])

# run_test.sh
cd $SRC_DIR/test
cmake .
make
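For a pure Python package, the Python flavour (run_test.py) can be as small as an import check. A minimal sketch: the module list below is a placeholder so that the script runs anywhere, and check_import is just an illustrative name. In a real recipe you would list the modules your package installs (for pytest-benchmark that would presumably be pytest_benchmark, which is an assumption here).

```python
# run_test.py (sketch)
import importlib

# Placeholder so the sketch runs anywhere; replace with the modules that
# your package installs (for example "pytest_benchmark", an assumption)
MODULES = ["json"]


def check_import(module_name):
    """Import `module_name` and return it; an ImportError fails the test run."""
    return importlib.import_module(module_name)


for name in MODULES:
    module = check_import(name)
    print("imported", module.__name__)
```

Any uncaught exception makes the script exit with a non-zero status, which is how a failed test is usually signalled.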

Convert pure Python packages to other platforms

Using conda convert for pure Python packages, we can quickly provide packages for other platforms


In [3]:
!conda convert ~/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 --platform all | grep Converting


Converting /home/juanlu/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 from linux to osx-64
Converting /home/juanlu/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 from linux to linux-32
Converting /home/juanlu/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 from linux to linux-64
Converting /home/juanlu/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 from linux to win-32
Converting /home/juanlu/.miniconda3/conda-bld/linux-64/pytest-benchmark-3.0.0-py35_0.tar.bz2 from linux to win-64

Platform-specific metadata

# from glpk
build:
  features:
    - vc9   # [win and py27]
    - vc10  # [win and py34]
    - vc14  # [win and py35]

requirements:
  build:
    - gmp  # [linux or osx]
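Selectors are written as a trailing "# [expression]" comment, and a line survives only if its expression is true for the platform being built. The following is a simplified sketch of that idea, not conda-build's actual implementation; apply_selectors and the flags dictionary are made up for illustration.

```python
import re

# Matches a trailing "# [expression]" selector at the end of a line
SELECTOR = re.compile(r"(.+?)\s*#\s*\[(.+)\]\s*$")


def apply_selectors(text, namespace):
    """Keep only the lines whose selector evaluates to true in `namespace`."""
    kept = []
    for line in text.splitlines():
        match = SELECTOR.match(line)
        if match is None:
            kept.append(line)  # no selector: always keep the line
            continue
        content, expression = match.groups()
        # eval is tolerable here only because recipes are trusted input
        if eval(expression, {"__builtins__": {}}, dict(namespace)):
            kept.append(content)
    return "\n".join(kept)


recipe = """\
features:
  - vc9   # [win and py27]
  - vc10  # [win and py34]
  - vc14  # [win and py35]
"""

# Hypothetical build: Windows with Python 3.5
flags = {"win": True, "linux": False, "osx": False,
         "py27": False, "py34": False, "py35": True}
print(apply_selectors(recipe, flags))
```

With these flags only the vc14 line is kept; flipping win to False drops all three feature lines.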

Templating for meta.yaml

Metadata files support templating using Jinja2!

# from glpk
build:
  number: {{ environ.get("APPVEYOR_BUILD_NUMBER", 1) }}  # [win]

# from poliastro at conda-forge
{% set version = "0.5.0" %}

package:
  name: poliastro
  version: {{ version }}

source:
  fn: v{{ version }}.tar.gz
  url: https://github.com/poliastro/poliastro/archive/v{{ version }}.tar.gz

Working with other languages

or: conda as a cross-platform package manager

  • conda can be used to build software written in any language
  • Just don't include python as a build or run dependency!
  • It's already being used to distribute pure C and C++ libraries, R packages...

# build.sh from glpk
export CFLAGS="-O3"
./configure --prefix=$PREFIX --with-gmp
make check install

Important caveat:

The burden is on you

There be dragons

  • conda-build does not solve cross-compiling so you will need to build compiled packages on each platform
  • Regarding Linux, there are a lot of sources of binary incompatibility
    • Building on a clean operating system is key
    • Using an old version of Linux (CentOS 5?) also helps, because many core system libraries have strict backwards compatibility policies
    • Packages that assume everything lives in root locations will fail to compile
    • Sometimes careful editing of compiler flags and even patching is necessary

If the recipe builds on a fresh, headless, old Linux it will work everywhere
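A quick way to see how "old" a build machine really is consists of asking Python which C library it was linked against. A minimal sketch using only the standard library; describe_build_host is an illustrative name, and on systems where the C library cannot be detected the fields simply read "unknown".

```python
import platform


def describe_build_host():
    """Collect the platform facts that matter for binary compatibility."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "system": platform.system(),      # e.g. "Linux"
        "machine": platform.machine(),    # e.g. "x86_64"
        "libc": libc_name or "unknown",   # "glibc" on most Linux distributions
        "libc_version": libc_version or "unknown",
    }


for key, value in sorted(describe_build_host().items()):
    print("%-12s %s" % (key, value))
```

Binaries built against an older glibc generally run on newer ones, but not the other way around, which is why the age of the build host matters.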

conda-forge: a community repository

conda-forge is a GitHub organization containing repositories of conda recipes. Thanks to some awesome continuous integration providers (AppVeyor, CircleCI and TravisCI), each repository, also known as a feedstock, automatically builds its own recipe in a clean and repeatable way on Windows, Linux and OS X.

Features:

  • Automatic linting of recipes
  • Continuous integration of recipes in Linux, OS X and Windows
  • Automatic upload of packages

What I love:

  • Having a blessed community channel (like Arch Linux AUR)
  • Ensuring recipes run everywhere
  • High quality standards!

Limitations and future work

conda (2012?) and conda-build (2013) are very young projects and still have some pain points that ought to be addressed

  • Support for gcc and libgfortran is not yet polished in Anaconda and there are still some portability issues

The state of Python packaging is improving upstream too!

  • pip builds and caches wheels locally - the problem of compiling NumPy over and over again was addressed a while ago
  • Windows and OS X wheels are easy to build and widely available for many scientific packages
  • PEP 0513 provides a way to finally upload Linux wheels to PyPI which are compatible with many Linux distributions
  • PEP 0516 proposes "a simple and standard sdist format that isn't intertwined with distutils"!!!1!

Still, there are some remaining irks:

Conclusions

Approach me during the conference, interrupt me while I'm in a conversation, ask me questions, let's talk about your ideas and projects! 😊

Thanks for your attention!